Assignment 3 - Part 1

Assignments
Author

Khushi

Introduction

We are investigating spending behaviour between genders.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(mosaic)
Registered S3 method overwritten by 'mosaic':
  method                           from   
  fortify.SpatialPolygonsDataFrame ggplot2

The 'mosaic' package masks several functions from core packages in order to add 
additional features.  The original behavior of these functions should not be affected by this.

Attaching package: 'mosaic'

The following object is masked from 'package:Matrix':

    mean

The following objects are masked from 'package:dplyr':

    count, do, tally

The following object is masked from 'package:purrr':

    cross

The following object is masked from 'package:ggplot2':

    stat

The following objects are masked from 'package:stats':

    binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
    quantile, sd, t.test, var

The following objects are masked from 'package:base':

    max, mean, min, prod, range, sample, sum
library(ggformula)
library(crosstable)

Attaching package: 'crosstable'

The following object is masked from 'package:purrr':

    compact
library(skimr)

Attaching package: 'skimr'

The following object is masked from 'package:mosaic':

    n_missing
library(dplyr)
library(broom)

Research Experiment to Investigate Spending Behaviour Between Genders

Objective:
The purpose of this research experiment is to investigate whether there is a significant difference in the amount of money spent by guys and girls among students at MAHE Bengaluru. The study tests the hypothesis that guys tend to spend more money than girls, using spending data collected for a specific date.

Hypothesis:

  • Guys spend more money on average compared to girls within a given timeframe.

Experiment Design:
Data Collection:

  • Sample: The dataset consists of spending data from 82 students at MAHE Bengaluru, 41 guys and 41 girls. The participants were randomly selected through coin tosses and asked to report their spending on October 23rd, 2024.

  • Data Sources: The data was recorded in an Excel sheet by the people conducting the experiment. Each participant reported how much money they spent on the allocated date.

Variables Measured:

  • Target Variable: Amount of money (in Indian rupees) spent by each participant on October 23rd, 2024.

  • Predictor Variable: Gender (guys vs. girls)

Sampling:

  • Time Period: Responses were collected on October 24th, 2024, over a period of 1.5 hours, and refer to spending on October 23rd, 2024.

  • Participant Selection Criteria: 41 guys and 41 girls were randomly selected by tossing a coin.

Analysis Plan:

  • Data Cleaning and Transformation: The data was checked for accuracy in the Excel sheet, ensuring that each participant’s spending was properly recorded and that there were no missing or erroneous entries.

  • Exploratory Data Analysis (EDA): Visualizations like histograms and box plots will be used to compare spending distributions between guys and girls, in order to identify any patterns or discrepancies.

Statistical Tests:

  • Two-Sample t-Test: A two-sample t-test for independent means will be conducted to compare the average amount spent by guys and girls. This determines whether there is a statistically significant difference in spending behaviour between the two groups.

  • Permutation Test: A permutation test will be conducted to assess how likely the observed difference in spending would be if gender labels were assigned at random. This provides a non-parametric alternative to the two-sample t-test.

  • Descriptive Statistics: Measures such as the mean, median, and standard deviation will be calculated for both groups to summarize spending behaviour.

  • Wilcoxon Rank-Sum Test: This non-parametric test will be conducted to compare spending between guys and girls if the normality assumption is not met.

Limitations:

  • Sample Size: With 41 participants per gender, drawn from a single campus on a single day, the results may not generalize to a larger population. A larger sample would have provided greater reliability.

  • Contextual Factors: The study did not account for reasons behind spending (e.g., necessities vs. discretionary purchases), which might have influenced spending patterns.

Outcome:
The analysis will reveal whether a significant difference in spending exists between guys and girls at MAHE Bengaluru. If the hypothesis holds, it would suggest that guys, on average, spend more in a limited timeframe. These findings will provide insights into gender-specific spending behaviour within this student population.

Dataset - Spending

spending_23rd <- read_csv("../../data/Pocketmoney.csv")
Rows: 82 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Name, Gender
dbl (2): Sr no, Money Spent

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spending_23rd
# A tibble: 82 × 4
   `Sr no` Name     Gender `Money Spent`
     <dbl> <chr>    <chr>          <dbl>
 1       1 Aagam    Male             150
 2       2 Aakash   Male             240
 3       3 Aarushi  Female           382
 4       4 Abheeta  Female            60
 5       5 Adithya  Male              68
 6       6 Aditya   Male             300
 7       7 Akanksha Female           270
 8       8 Amruta   Female           190
 9       9 Anaaya   Female           300
10      10 Anish    Male               0
# ℹ 72 more rows

The table provides information on the spending behaviour of individuals on October 23rd, categorized by gender. It highlights the amount spent by 82 participants, allowing for a comparison of spending patterns between guys and girls. The data provides insight into the variability in spending amounts, as well as individual outliers, offering a basis for further analysis of gender-based financial behaviour.

# Convert Gender from character to a factor (levels in order of first appearance: Male, Female)
spending_modified <- spending_23rd %>% 
  dplyr::mutate(Gender = as_factor(Gender))

Glimpse - Spending Behaviour

glimpse(spending_modified)
Rows: 82
Columns: 4
$ `Sr no`       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
$ Name          <chr> "Aagam", "Aakash", "Aarushi", "Abheeta", "Adithya", "Adi…
$ Gender        <fct> Male, Male, Female, Female, Male, Male, Female, Female, …
$ `Money Spent` <dbl> 150, 240, 382, 60, 68, 300, 270, 190, 300, 0, 250, 85, 7…

The columns are a unique identifier for each participant (Sr no), their Name, their Gender (now a factor with levels Male and Female), and Money Spent. Money Spent is numeric and ranges from 0 to 13,000 rupees, which allows the spending patterns of males and females to be compared.

Inspect - Spending Behaviour

inspect(spending_modified)

categorical variables:  
    name     class levels  n missing
1   Name character     82 82       0
2 Gender    factor      2 82       0
                                   distribution
1 Aagam (1.2%), Aakash (1.2%) ...              
2 Male (50%), Female (50%)                     

quantitative variables:  
         name   class min     Q1 median     Q3   max     mean         sd  n
1       Sr no numeric   1  21.25   41.5  61.75    82  41.5000   23.81526 82
2 Money Spent numeric   0 100.00  264.5 596.25 13000 720.9634 1835.72169 82
  missing
1       0
2       0

The gender distribution is evenly split (41 males and 41 females), and there are no missing values. Money Spent ranges from 0 to 13,000 rupees. The median is 264.5 rupees, while the mean is 720.96 rupees, about 2.7 times larger. This gap indicates that a few individuals spent far more than the rest. The standard deviation of 1,835.72 rupees reflects this high variability. The third quartile is 596.25 rupees, so roughly three quarters of participants spent less than about 600 rupees, while a few high spenders pulled the overall average up.
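A small example shows why the mean and median diverge so much. The sketch takes the ten Money Spent values printed in the table above and adds one value equal to the observed maximum of 13,000 rupees. It is an illustration only, not a reanalysis of the full dataset.

```r
# Ten "Money Spent" values from the first rows of the table above (rupees)
first_ten <- c(150, 240, 382, 60, 68, 300, 270, 190, 300, 0)

# The same values plus one extreme spender at the observed maximum
with_max <- c(first_ten, 13000)

# Compare a non-robust summary (mean) with a robust one (median)
c(mean = mean(first_ten), median = median(first_ten))  # mean 196, median 215
c(mean = mean(with_max),  median = median(with_max))   # mean 1360, median 240
```

The single extreme value multiplies the mean by roughly seven but moves the median by only 25 rupees. The full data shows the same pattern: a mean of 720.96 against a median of 264.5.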

Skim - Spending Behaviour

skim(spending_modified)
Data summary
Name spending_modified
Number of rows 82
Number of columns 4
_______________________
Column type frequency:
character 1
factor 1
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Name 0 1 3 12 0 82 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
Gender 0 1 FALSE 2 Mal: 41, Fem: 41

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Sr no 0 1 41.50 23.82 1 21.25 41.5 61.75 82 ▇▇▇▇▇
Money Spent 0 1 720.96 1835.72 0 100.00 264.5 596.25 13000 ▇▁▁▁▁

The key variable of interest, Money Spent, has a mean of 720.96 rupees and a standard deviation of 1,835.72 rupees. Spending ranges from 0 to 13,000 rupees, with a median of 264.5 rupees. The distribution is heavily right-skewed: most participants spent less than the mean, and a few high spenders account for a large share of the total. The single largest amount, 13,000 rupees, is about 22% of the combined total of roughly 59,100 rupees (82 × 720.96). This concentration produces the large spread in spending behaviour.
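The analysis plan calls for descriptive statistics for each group, but inspect() and skim() summarise the pooled data. Below is a minimal base-R sketch of a per-gender summary. `group_summary()` is a hypothetical helper, and the demonstration uses only the ten rows printed in the table above, so its numbers are not the study's results. Calling `group_summary(spending_modified)` would apply it to the full dataset.

```r
# Hypothetical helper: n, mean, median, SD and IQR of `Money Spent` per Gender.
# Expects a data frame shaped like spending_modified.
group_summary <- function(df) {
  stats_one <- function(x) {
    c(n = length(x), mean = mean(x), median = median(x),
      sd = sd(x), iqr = IQR(x))
  }
  by_group <- split(df[["Money Spent"]], df$Gender)  # one vector per level
  t(sapply(by_group, stats_one))                     # one row per gender
}

# The first ten rows of the table above (names dropped)
demo <- data.frame(
  Gender = factor(c("Male", "Male", "Female", "Female", "Male",
                    "Male", "Female", "Female", "Female", "Male"),
                  levels = c("Male", "Female")),
  `Money Spent` = c(150, 240, 382, 60, 68, 300, 270, 190, 300, 0),
  check.names = FALSE
)

round(group_summary(demo), 2)
```

Reporting the median and IQR next to the mean and SD matters here, because the mean and SD are dominated by a few very large amounts.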

Data Dictionary

Quantitative Data:

  • Money Spent (dbl): The total amount of money spent by each participant on October 23rd, recorded in Indian rupees.

Qualitative Data:

  • Sr no (dbl): A sequential number assigned to each participant for identification purposes. It is stored as a number but carries no quantitative meaning.

  • Name (chr): The name of the participant.

  • Gender (fct): The gender of the participant, either Male or Female (read as character and converted to a factor in spending_modified).

Histogram - Distribution of Money Spent by Gender

gf_histogram(~ `Money Spent` | Gender, 
             data = spending_modified, 
             binwidth = 200, 
             fill = "skyblue", 
             color = "black") %>% 
  gf_labs(
    title = "Distribution of Money Spent by Gender",
    x = "Money Spent (in rupees)",
    y = "Count"
  )

The histograms comparing money spent by gender reveal that both males and females exhibit a similar spending pattern, with the majority of participants spending relatively small amounts, primarily below 1,000 rupees. There is a notable concentration of individuals in both groups who spent minimal amounts (0 to 500 rupees), with a few significant outliers who spent as much as 13,000 rupees. The distribution is right-skewed for both genders, indicating that while most participants spent modestly, a few individuals in each group spent considerably more. Overall, the spending behaviour shows similar trends between males and females, with only slight variations in the frequency of higher spending outliers.

Bar Chart - Average Money Spent by Gender

# gf_bar() counts rows, so compute one mean per gender first and draw it with gf_col()
spending_modified %>% 
  group_by(Gender) %>% 
  summarise(mean_spent = mean(`Money Spent`)) %>% 
  gf_col(mean_spent ~ Gender, fill = ~ Gender) %>% 
  gf_labs(
    title = "Average Money Spent by Gender",
    x = "Gender",
    y = "Average Money Spent (in rupees)"
  )

The bar chart compares the average money spent by each gender. Males spent 748.61 rupees on average and females 693.32 rupees (the group means reported by the t-test below), a difference of about 55 rupees. A handful of very large amounts pull both means well above the overall median of 264.5 rupees, so this gap should be read with caution. Whether it is statistically meaningful is assessed by the tests that follow.

Boxplot - Distribution of Money Spent by Gender

gf_boxplot(`Money Spent` ~ Gender, 
           data = spending_modified, 
           fill = ~ Gender) %>% 
  gf_labs(
    title = "Distribution of Money Spent by Gender",
    x = "Gender",
    y = "Money Spent (in rupees)"
  ) 

The box plot shows that males and females have a similar range of spending, with most participants spending relatively low amounts. The median spending is low for both genders, as shown by the median lines near the bottom of each box. However, there are a few notable outliers in both groups, with some individuals spending far more, particularly around 10,000 rupees for males. These outlying points stretch the vertical axis, which compresses both boxes toward zero. Overall, typical spending is modest, while a few individuals in both groups account for much higher spending.
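The isolated points in a box plot are values lying more than 1.5 times the interquartile range (IQR) beyond the box. ggplot2's box plot, which gf_boxplot() draws, computes the quartiles with R's default quantile() method, so the rule can be reproduced directly. The helper below is a hypothetical sketch. For the pooled data, the quartiles reported by inspect() (Q1 = 100, Q3 = 596.25) give an upper fence of 596.25 + 1.5 × 496.25 ≈ 1,340.6 rupees. The per-gender fences drawn in the plot will differ somewhat.

```r
# Hypothetical helper: TRUE for values a ggplot2-style box plot draws as
# outliers, i.e. more than coef * IQR below Q1 or above Q3.
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # default (type 7) quartiles
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Ten values from the table above plus one spender at the observed maximum
x <- c(150, 240, 382, 60, 68, 300, 270, 190, 300, 0, 13000)
x[flag_outliers(x)]   # 13000: Q1 = 109, Q3 = 300, upper fence = 586.5
```

Applied per group, for example with `tapply(spending_modified$\`Money Spent\`, spending_modified$Gender, function(v) v[flag_outliers(v)])`, this lists the amounts each box plot draws as points.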

Scatter Plot - Individual Money Spent by Gender

gf_point(`Money Spent` ~ Gender, 
         data = spending_modified,
         color = ~ Gender) %>% 
  gf_labs(
    title = "Individual Money Spent by Gender",
    x = "Gender",
    y = "Money Spent (in rupees)"
  ) 

The scatter plot displays individual spending points for both males and females, highlighting the distribution of money spent on October 23rd. Most participants, regardless of gender, spent relatively small amounts, clustering around the lower end of the plot. However, there are noticeable outliers in both groups, with a few individuals spending significantly higher amounts—one female and one male spending close to 13,000 rupees. These outliers are clearly separated from the main group, indicating that while the general spending pattern is modest, a few individuals contribute to a much higher level of spending.

Density Plot of Money Spent

spending_modified %>%
  gf_density(~ `Money Spent`, fill = "gray", alpha = 0.5) %>%
  gf_fitdistr(~ `Money Spent`, dist = "dnorm") %>%
  gf_labs(
    title = "Density Plot of Money Spent",
    subtitle = "Compared with Normal Distribution",
    x = "Money Spent (in rupees)",
    y = "Density"
  )

The spending data is highly right-skewed. Most participants spent small amounts, producing a sharp peak near zero, and the density drops off quickly into a long tail of a few much larger amounts. The fitted normal curve does not match this shape, so the spending data clearly does not follow a normal distribution, mainly because of the extreme values and the overall skewness. Tests that assume normality should therefore be interpreted with caution or applied to transformed data. With heavily skewed data and outliers, the t-test's p-value and confidence interval may be unreliable.
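The visual comparison can be backed up with a formal check. Below is a minimal sketch that runs a Shapiro-Wilk test for each gender separately. `normality_by_group()` is a hypothetical helper, and the demonstration runs on simulated right-skewed data, not the survey. With the real data it would be called as `normality_by_group(spending_modified$\`Money Spent\`, spending_modified$Gender)`.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each group separately
normality_by_group <- function(values, groups) {
  sapply(split(values, groups), function(x) shapiro.test(x)$p.value)
}

# Simulated right-skewed spending (NOT the survey data): 41 per group
set.seed(42)
spent  <- round(rlnorm(82, meanlog = 5.5, sdlog = 1.2))
gender <- factor(rep(c("Male", "Female"), each = 41),
                 levels = c("Male", "Female"))

# A p-value below 0.05 is evidence against normality in that group
normality_by_group(spent, gender)
normality_by_group(log(spent + 1), gender)
```

With 41 observations per group, the t-test relies partly on the central limit theorem rather than on exact normality. The more practical concern here is the handful of extreme values, which inflate both the means and the standard deviations.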

Log-Transformed Distribution of Money Spent

spending_modified <- spending_modified %>%
  mutate(log_money_spent = log(`Money Spent` + 1))

gf_density(~ log_money_spent, data = spending_modified) %>% 
  gf_labs(
    title = "Log-Transformed Distribution of Money Spent",
    x = "Log of Money Spent",
    y = "Density"
  )

The density plot of the log-transformed spending data shows a more symmetric distribution than the original skewed data. Adding 1 before taking the log keeps participants who spent 0 rupees in the data, since log(0 + 1) = 0. After the transformation, the peak lies around a log value of 4 to 5 (roughly 55 to 150 rupees). The transformation compresses the high values and reduces the influence of outliers, so the transformed data appears more suitable for parametric testing, although some spread remains. Note that the tests below are run on the original Money Spent values, not on log_money_spent.
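Since the log-transformed values look closer to normal, one option (not part of the original plan) is to run the Welch t-test on the log scale instead. A difference in means of log(x + 1) back-transforms to a ratio: exp() of the difference is the ratio of the geometric means of (Money Spent + 1), males to females. The sketch uses simulated data, not the survey. On the real data the call would be `t.test(log_money_spent ~ Gender, data = spending_modified)`.

```r
# Simulated right-skewed spending (NOT the survey data): 41 per group
set.seed(1)
spent  <- round(rlnorm(82, meanlog = 5.5, sdlog = 1.2))
gender <- factor(rep(c("Male", "Female"), each = 41),
                 levels = c("Male", "Female"))

log_spent <- log(spent + 1)          # same transform as log_money_spent
res <- t.test(log_spent ~ gender)    # Welch test (var.equal = FALSE by default)

# Difference of log-scale means (Male - Female) and its 95% CI,
# back-transformed to a ratio of geometric means of (spent + 1)
log_diff <- unname(res$estimate[1] - res$estimate[2])
exp(log_diff)
exp(res$conf.int)
```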

The t-test

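# Welch two-sample t-test (unequal variances, the default).
# conf.int is a wilcox.test() argument; t.test() always reports a confidence interval.
t_test_result <- t.test(
  `Money Spent` ~ Gender, 
  data = spending_modified, 
  mu = 0, 
  alternative = "two.sided", 
  conf.level = 0.95
) %>% 
  broom::tidy()

t_test_result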
# A tibble: 1 × 10
  estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
     <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl>
1     55.3      749.      693.     0.136   0.893      76.5    -757.      868.
# ℹ 2 more variables: method <chr>, alternative <chr>

The Welch two-sample t-test comparing the average money spent by males and females gives a t-value of 0.13555 on about 76.5 degrees of freedom and a p-value of 0.8925, far above the usual significance threshold of 0.05. There is therefore no statistically significant difference in mean spending between the two groups. The 95% confidence interval for the difference in means (males minus females) runs from -757.07 to 867.66 rupees. Because it includes zero, a true difference of zero is consistent with the data. Mean spending was 748.61 rupees for males and 693.32 rupees for females. This difference of 55.29 rupees is small relative to the spread of the data. Note that this test is two-sided, whereas the hypothesis is directional (guys spend more).
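The hypothesis states a direction (guys spend more), so a one-sided alternative matches it more closely than the two-sided test above. In t.test() the difference is the first factor level minus the second. Gender has levels Male then Female, so alternative = "greater" tests whether males spend more. Because the observed t (0.136) is positive, the one-sided p-value is half the two-sided one, about 0.8925 / 2 ≈ 0.45, and the conclusion would not change. The sketch checks that relationship on simulated data, not the survey. On the real data the call would be `t.test(\`Money Spent\` ~ Gender, data = spending_modified, alternative = "greater")`.

```r
# Simulated spending (NOT the survey data), Male listed as the first level
set.seed(7)
spent  <- round(rlnorm(82, meanlog = 5.5, sdlog = 1.2))
gender <- factor(rep(c("Male", "Female"), each = 41),
                 levels = c("Male", "Female"))

two_sided <- t.test(spent ~ gender)                          # H1: diff != 0
greater   <- t.test(spent ~ gender, alternative = "greater") # H1: Male > Female

# For a positive t statistic the one-sided p-value is half the two-sided one;
# for a negative t it is 1 minus that half.
half_p <- two_sided$p.value / 2
expected_p <- if (two_sided$statistic > 0) half_p else 1 - half_p
c(two_sided = two_sided$p.value, greater = greater$p.value,
  expected = unname(expected_p))
```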

Wilcoxon Rank-Sum Test

wilcox_test <- wilcox.test(
  `Money Spent` ~ Gender, 
  data = spending_modified, 
  alternative = "two.sided", 
  conf.int = TRUE, 
  conf.level = 0.95
) %>% 
  broom::tidy()
Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
compute exact p-value with ties
Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
compute exact confidence intervals with ties
wilcox_test
# A tibble: 1 × 7
  estimate statistic p.value conf.low conf.high method               alternative
     <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>                <chr>      
1     55.0      936.   0.381    -70.0      180. Wilcoxon rank sum t… two.sided  

The Wilcoxon rank-sum test (also known as the Mann-Whitney test) gives a p-value of 0.3806, which is greater than the common significance threshold of 0.05. There is therefore no statistically significant evidence that spending in one group tends to be higher than in the other. The test statistic is W = 935.5. The estimated location shift (males minus females) is 55.00 rupees. This estimates the median of the difference between a male and a female spending value, not the difference in group medians. Its 95% confidence interval runs from -69.99 to 180 rupees and includes zero. Because some spending amounts are tied, R used a normal approximation instead of exact p-values and intervals, as the warnings above indicate. Based on this test, we cannot conclude that there is a significant difference in spending between males and females.
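The W = 935.5 statistic is computed as follows. wilcox.test() pools both groups, ranks all values (ties get the average rank), sums the ranks of the first group (here Male), and subtracts n1(n1 + 1)/2. W can range from 0 to 41 × 41 = 1,681 and averages 840.5 when there is no difference, so 935.5 means males tend to rank only slightly higher. The hypothetical helper below reproduces the statistic on the ten rows printed in the table above.

```r
# Hypothetical helper: rank-sum statistic W as reported by wilcox.test().
# W = (sum of the pooled ranks of group 1) - n1 * (n1 + 1) / 2
rank_sum_W <- function(x, y) {
  r <- rank(c(x, y))                 # tied values receive their average rank
  n1 <- length(x)
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
}

# First ten rows of the table above, split by gender (rupees)
male   <- c(150, 240, 68, 300, 0)
female <- c(382, 60, 270, 190, 300)

rank_sum_W(male, female)                           # 7.5
wilcox.test(male, female, exact = FALSE)$statistic # W = 7.5
```

Setting exact = FALSE here simply requests the normal approximation up front, which R falls back to anyway when there are ties (the two 300s).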

Permutation Test

perm_test_result <- do(1000) * diffmean(
  shuffle(`Money Spent`) ~ Gender, 
  data = spending_modified
)
872   483.7317073
873  -441.4390244
874  -289.0487805
875   162.8048780
876  -209.5853659
877  -562.1707317
878    79.1463415
879   357.9756098
880  -483.9268293
881   154.8048780
882  -214.4146341
883   448.9024390
884   104.0731707
885   -93.4878049
886   701.2926829
887    -1.0487805
888  -496.7073171
889  -692.7560976
890   128.9024390
891   676.2682927
892  -251.0975610
893   707.3414634
894   567.3414634
895    35.5853659
896   295.5365854
897   638.0731707
898   782.0731707
899   -88.5121951
900  -403.3414634
901   463.0487805
902  -366.3658537
903  -681.7317073
904  -558.3658537
905   235.6341463
906  -226.2195122
907  -460.0243902
908   418.8536585
909   -30.6585366
910  -508.7073171
911    55.6341463
912    38.8048780
913   180.6097561
914  -206.2682927
915   167.4878049
916   667.3902439
917   149.6829268
918  -567.6341463
919   -73.8292683
920  -625.8292683
921    55.3414634
922   484.9512195
923  -452.4634146
924  -271.4878049
925  -469.1951220
926   633.5853659
927   -77.0000000
928   -71.5853659
929  -112.5121951
930   377.5853659
931  -272.9024390
932  -338.6097561
933   714.5609756
934   394.4146341
935   157.0487805
936  -264.6585366
937  -762.9512195
938    95.1951220
939   569.4878049
940   563.5365854
941   779.6341463
942   524.4634146
943  -581.1951220
944   587.5853659
945   152.9024390
946  -757.8780488
947    44.3658537
948  -102.8536585
949   -80.4146341
950  -108.8048780
951  -606.2195122
952  -777.4878049
953   674.4146341
954   428.4634146
955   578.9512195
956   487.5365854
957   132.3170732
958  -203.3414634
959    83.0000000
960    84.4146341
961  -529.6341463
962   -71.1463415
963   139.9756098
964  -501.1951220
965   551.0487805
966   505.4390244
967   144.6097561
968  -124.6097561
969  -643.4390244
970  -683.2926829
971   -46.5121951
972  -265.6341463
973  -592.1707317
974   204.0731707
975   -49.9756098
976    68.3170732
977  -135.6341463
978  -249.6341463
979    31.8292683
980   449.5853659
981   447.2439024
982   103.9756098
983   468.1219512
984   664.6585366
985   676.5121951
986   710.3170732
987   -78.1707317
988  -430.8048780
989   551.5365854
990    79.1951220
991    14.4634146
992     9.4878049
993   569.5853659
994   151.7317073
995     2.3658537
996  -529.4878049
997   397.8780488
998   -80.9512195
999   103.2926829
1000  -53.4878049

The results of the permutation test display 1,000 shuffled differences in mean spending between males and females. These differences have both positive and negative values. The negative values indicate cases where, after shuffling, the group representing females had higher average spending, while the positive values indicate that the male group had higher average spending in the shuffled data.
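The shuffling step can be sketched in base R. This is a minimal illustration, not the code used in this analysis. The data frame `toy` and the helper `diff_in_means()` are hypothetical, and the sign convention (Male minus Female) is assumed to match the one used for `diffmean` here.

```r
set.seed(42)  # makes the shuffles reproducible

# Hypothetical toy data; the real analysis uses spending_modified
toy <- data.frame(
  Gender = rep(c("Female", "Male"), each = 6),
  spent  = c(520, 610, 450, 700, 380, 590,
             480, 660, 300, 720, 410, 550)
)

# Hypothetical helper: difference in mean spending, Male minus Female
diff_in_means <- function(spent, gender) {
  mean(spent[gender == "Male"]) - mean(spent[gender == "Female"])
}

observed <- diff_in_means(toy$spent, toy$Gender)

# Shuffle the Gender labels 1,000 times. sample() permutes the labels,
# so every shuffle keeps the same number of Female and Male rows
perm_diffs <- replicate(1000, diff_in_means(toy$spent, sample(toy$Gender)))

summary(perm_diffs)
```

Each shuffled difference is what we would expect to see if gender had no connection to spending, because the labels are reassigned at random.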

Plotting the results of the Permutation Test

# A binwidth of 50 rupees gives a few dozen bins across the range of shuffled differences
gf_histogram(~ diffmean, data = perm_test_result, fill = "lightblue", binwidth = 50) %>%
  gf_labs(
    title = "Permutation Test for Difference in Spending",
    subtitle = "Permutation Distribution of Mean Spending Differences",
    x = "Difference in Mean Spending (Rupees)",
    y = "Frequency"
  )

The plot of the permutation test results shows the distribution of 1,000 differences in mean spending between males and females after the group labels were randomly shuffled. The shuffled differences cover a wide range. The printed values alone run from about -810 to +850 rupees. The distribution is centred around zero and is roughly symmetric. This is the pattern we expect when group labels carry no information: the shuffled differences vary in both directions around zero, with no directional bias between the two groups.
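The features described above (range, centre, and balance of signs) can also be checked numerically rather than by eye. The following is a minimal sketch. It assumes the shuffled differences are available as a numeric vector such as `perm_test_result$diffmean`. The short vector below is made up for illustration.

```r
# Made-up stand-in for perm_test_result$diffmean
perm_diffs <- c(-600, -250, -80, -20, 5, 40, 90, 300, 650)

range(perm_diffs)      # -600 650: smallest and largest shuffled differences
median(perm_diffs)     # 5: centre of the distribution
mean(perm_diffs > 0)   # 5 of 9 shuffles have a higher Male mean
```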

The observed difference in means

observed_diff <- diffmean(`Money Spent` ~ Gender, data = spending_modified)
observed_diff
 diffmean 
-55.29268 

The observed difference in mean spending between males and females is approximately -55.29 rupees. The negative sign indicates that, on average, females in the sample spent about 55 rupees more than males. This difference is modest compared with the spread of the shuffled differences.
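`diffmean()` reports a single number. The sketch below shows the arithmetic behind it in base R, using a hypothetical six-row data frame. It assumes that `diffmean` subtracts the mean of the first `Gender` level (alphabetically, Female) from the mean of the second level (Male). This assumption is consistent with how the sign is interpreted above.

```r
# Hypothetical toy data with the same structure as spending_modified
toy <- data.frame(
  Gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  spent  = c(400, 500, 600, 350, 450, 500)
)

# Mean spending per group; tapply() orders the groups alphabetically
group_means <- tapply(toy$spent, toy$Gender, mean)
group_means   # Female 500.0000, Male 433.3333

# Male minus Female, matching the assumed diffmean sign convention
observed <- as.numeric(group_means["Male"] - group_means["Female"])
observed      # -66.66667, so females spent more on average
```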

Calculating the p-value

p_value <- prop(~ (abs(diffmean) >= abs(observed_diff)), data = perm_test_result)
p_value
prop_TRUE 
    0.889 

The permutation test gives a p-value of 0.889. In 88.9% of the 1,000 shuffles, the absolute difference in mean spending was at least as large as the absolute observed difference (approximately 55.29 rupees). In other words, if gender had no relationship to spending, a difference this large would be very common. Given this high p-value, we fail to reject the null hypothesis: there is no statistically significant difference in spending behaviour between males and females in this dataset. The observed difference is not extreme enough to conclude that the two groups spend differently.
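The p-value above is the proportion of shuffled differences whose absolute value is at least as large as the absolute observed difference. The following minimal base R sketch shows the arithmetic using a made-up vector of eight shuffled differences. A one-sided version is included for comparison. It is an assumption about how a directional test of the original hypothesis (males spend more, so Male minus Female is greater than zero) would be set up.

```r
# Made-up shuffled differences (Male - Female), standing in for perm_test_result$diffmean
perm_diffs    <- c(-120, -40, 10, 60, 90, -15, 200, -70)
observed_diff <- -55.29

# Two-sided: how often is a shuffled difference at least as extreme in either direction?
p_two_sided <- mean(abs(perm_diffs) >= abs(observed_diff))
p_two_sided   # 0.625 (5 of 8)

# One-sided, for the alternative "males spend more" (difference > 0)
p_one_sided <- mean(perm_diffs >= observed_diff)
p_one_sided   # 0.75 (6 of 8)
```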

Summary of Results for the Spending Behaviour Research Experiment

The research experiment aimed to determine whether there was a significant difference in spending behaviour between male and female students at MAHE Bengaluru. The following statistical tests were conducted:

  1. Two-Sample t-Test (Welch’s t-test):

    • The two-sample t-test comparing the mean spending between males and females resulted in a p-value of 0.8925. This indicates no statistically significant difference in the average spending between the two groups, as the p-value is much higher than the typical threshold of 0.05.
  2. Wilcoxon Rank-Sum Test (Mann-Whitney U):

    • The Wilcoxon test, a non-parametric alternative to the t-test, resulted in a p-value of 0.3806. This similarly suggests no significant difference in spending behaviour between the two groups, and it supports the findings of the t-test.
  3. Permutation Test:

    • The permutation test assesses how often a difference at least as large as the observed one arises when gender labels are shuffled at random. It resulted in a p-value of 0.889. The observed difference in mean spending of approximately -55.29 rupees (females spending slightly more than males on average) is therefore consistent with random variation. The data provide no evidence of a true underlying difference in spending behaviour.
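The following minimal base R sketch shows how the first two tests are usually called. It uses a hypothetical 15-row data frame, not the study data. With a formula, `stats::t.test()` uses Welch's unequal-variance test by default. For two independent groups, `stats::wilcox.test()` runs the rank-sum (Mann-Whitney) form of the Wilcoxon test. The `stats::` prefix avoids the `mosaic` version of `t.test` that masks the base function.

```r
# Hypothetical toy data standing in for spending_modified
toy <- data.frame(
  Gender = rep(c("Female", "Male"), times = c(7, 8)),
  spent  = c(420, 515, 610, 380, 700, 455, 530,
             390, 640, 505, 470, 720, 300, 585, 445)
)

# Welch's t-test: var.equal = FALSE is the default
welch <- stats::t.test(spent ~ Gender, data = toy)

# Two independent groups, so this is the Wilcoxon rank-sum test
rank_sum <- stats::wilcox.test(spent ~ Gender, data = toy)

welch$method
rank_sum$method
c(welch = welch$p.value, wilcoxon = rank_sum$p.value)
```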

Conclusion:

Based on the results of these tests, there is no evidence that males and females differ significantly in spending behaviour in this dataset. The high p-values across all three tests indicate that the observed differences in mean spending are consistent with chance variation. We therefore cannot conclude that gender has a significant effect on spending in this sample. The initial hypothesis that males spend more than females is not supported by the data. In fact, the observed difference points slightly in the opposite direction, though not significantly.